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THE DISTRIBUTION OF MAXIMA OF APPROXIMATELY 
GAUSSIAN RANDOM FIELDS 

By Yuval Nardi, 1 David O. Siegmund 2 ' 3 and Benjamin Yakir 3 

Carnegie Mellon University, Stanford University and Hebrew University 

Motivated by the problem of testing for the existence of a signal 
of known parametric structure and unknown "location" (as explained 
below) against a noisy background, we obtain for the maximum of a 
centered, smooth random field an approximation for the tail of the 
distribution. For the motivating class of problems this gives approxi- 
mately the significance level of the maximum score test. The method 
is based on an application of a likelihood-ratio-identity followed by 
approximations of local fields. Numerical examples illustrate the ac- 
curacy of the approximations. 

1. Introduction and summary. There are two central themes in this pa- 
per. One is the development of a method for the derivation of analytic ap- 
proximations for the tail of the distribution of the maximum of a smooth ran- 
dom field. Such random fields and the distribution of their maxima emerge 
naturally in a variety of statistical applications, for example, brain mapping 
or searching for hot spots of disease in space and/or time. See, for example, 
[10, 11, 12, 13, 14, 15, 16, 19]. 

The other theme involves the detailed investigation of a specific case, 
which is asymptotically Gaussian but where direct application of results for 
Gaussian fields does not seem adequate. This field arises in the context of 
testing for the presence of a signal of a given parametric structure within a 
noisy image. The image is composed of an array of pixels. The effect of a 
signal at a given pixel depends on the distance between the signal and the 
pixel. A score statistic is constructed for each candidate signal and an overall 
test statistic for the presence of some signal is obtained by maximizing the 
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score over the collection of all candidate "locations." We will consider second 
order approximations for the tail of the distribution of the test statistic under 
the null hypothesis of the absence of a signal — the significance level of the 
test. Similar methods can be applied to obtain an approximation for the 
power. In Section 5 we indicate how the method may also be adapted to 
other models, including (under different technical conditions) the frequently 
discussed case of smooth Gaussian fields, and how it can be used to obtain 
higher order approximations. 

We begin by describing the random field of interest. Assume that there 
is a process {W u : u G A n } over some space. These \A n \ = n observations are 
mutually independent with distributions F u . The collection A n indicates 
the locations of pixels, which may or may not be regularly spaced, and the 
index u denotes the location of a particular pixel. The process of interest 
is X t = E« e ^„ 6u(t)W u , t£T. We assume that 9:A n xT^ R + is a known 
real-valued function and that T is a nice subset of the ci-dimensional Eu- 
clidean space. In typical applications t is a parameterization of a putative 
signal and 9 u (t) represents the effect of W u on that signal. The case where 
T is a rectangle with sides parallel to the coordinate axes, t indexes sub- 
rectangles, also with sides parallel to the coordinate axes but having varying 
location and dimensions, and 9 u {t) is the indicator that u belongs to t was 
discussed by Siegmund and Yakir [14]. In this paper we consider the case 
where U is a smooth function of t £ T for each u. In an example that we 
consider later, t is a line segment joining the left and right sides of the unit 
square and 9 u (t) is a decreasing function of the distance from the point u to 
the line t. In this example, large values of W u for u close to t lead to large 
values of X t and indicate the presence of the signal t, while values of W u for 
u far from t have less effect on Xt. 

We embed the distributions of G T in an exponential family with a 
natural parameter £ by assuming that the W^s obey an exponential distri- 
bution laws of the form: 

(1.1) dF u (w u )=exp{tO u {t)w u -^ u {te u (t))}dF u (w u ), £eR,teT. 

Throughout the paper we use the convention that derivatives with respect 
to £ are denoted by apostrophes, while derivatives with respect to t by dots. 
We assume that the distributions have been standardized so that 

(1.2) ^«(0) = 0, <(0) = 0, <(0) = 1. 

We can formulate the null hypothesis of no signal as Hq :£ = for all t, 
while under the alternative £ = £j > for some value of t. In terms of the 
exponential embedding, the log-likelihood for fixed t is given by 

(1.3) l n (t,0 = {tfu(t)W u - M^u(t))}. 
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Differentiating the log-iikelihood (1.3) twice with respect to £ and substitut- 
ing £ = 0, we obtain the standardized score statistic at a fixed t GT, which 
by virtue of (1.2) is given by 

(1-4) Z n (t)= l -^== £ P u , n (t)W u , 

where I n (t) = J2 u eA„ @u(t) is the Fisher information and /3 u , n (i) = u (t)/ 
[.^(t)] 1 / 2 . A (one-sided) test statistic is obtained by maximizing Z n (t) over 
T. Since the value of t giving rise to the signal is unknown, we consider as a 
test statistic max t Z n (£). The associated p-value is given by the probability: 



(1.5) P ( sup Z n (t)>x 

\t£T 

computed under the assumption that £ = 0. 

Consider the random field {Z n (t),t £ T}. For each fixed t, when £ = 
the random variable Z n (t) is asymptotically standard normal. The sample 
points of this field are real valued functions on T, which are smooth functions 
of t, since the {0 u (t)} are. The main result that we would like to establish 
is that when x = o(n 1 / 4 ), the probability (1.5) can be approximated, up to 
a term inside the braces that is o(l/x) by 

x d - 1 {2Ti)- d l 2 (^{x) 

- s ^\An(t)\ 1/2 (l-r 2 n (t)/2al(t))dt 

1/2 



T 




b 








X 


/ 

IdT 



(1.6) 



- & ^\\K{t)\ ■ (g(t),KHt)g(t))) 1/2 /\\g(t)\\dv dT (t) 

The indicated approximation involves powers of the threshold x and the 
standard normal density <fi. It also involves integration of functions of t 
denoted by A n (i), 5 n (t), r n (t), a^(t) and g(t), which are defined below. 
Angular brackets "(■,■)" correspond to the inner product of vectors and 
"|| • ||" to the Euclidean norm. 

The region T is defined with the aid of constraint functions: 

(1.7) T = {t eR d :gi(t) <0,l<i<m}. 

These functions are generically denoted g(t), with gradient vector g(t). The 
boundary of the region, dT, corresponds to values of the parameters, in 
general a manifold, where g(t) = 0. The differential dVgx denotes the volume 
element of the {d — 1 dimensional part of the) boundary of T. 
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The other functions in (1.6) are functionals of Z n (t), the value of the field 
at t, and of Z n (t), the random gradient of the field at that point. These 
functionals are computed under the alternative distribution for W u given by 
(1.1) with the amplitude 

(1-8) i t = i t ^ x = xl- l ' 2 {t). 

Specifically, A n (i) is a positive-definite matrix defined by 

h Q , a ( ., _ EueAju(t)®e u (t)} _ [i n (t)®i n (t)] 

{ ' n(j " In(t) *W) ' 

where "<8>" is the outer (Kronecker) product. The matrix A n (t) is (asymptot- 
ically) the covariance matrix of Z n (t). It is related to the expected value of 
the Hessian of Z n (t) under the tilted measure by J2 U <=A„ Pu,n(t)ift' (£tQu(t)) = 
-xk n {t) + 0{x 2 n- l l 2 ). 

The term 5 n (t) is the difference between the actual log-likelihood and the 
log-likelihood of the approximating normal distributions: 

(1.10) 5 n (t) = l n (t,Ct)-[xZ n (t)-x 2 /2]=x 2 /2- J2 MitO u (t)). 

This quantity is deterministic and is of the order of magnitude 0(x 3 n~ 1 ^ 2 ). 
Another measure of discrepancy from the normal limit is r n (t), which is the 
difference between the (tilted) expectation of Z n (t) and the threshold x: 

(1-11) r n (t) = £ /3 u , n m' u (Mt)) - x. 

u£A„ 

This term is of the order of magnitude of 0{x 2 n~ 1 / 2 ). Finally, the term cr^(t) 
is given in terms of the d x d covariance matrix of the gradient: 

UGAn 

and the correlation between Z n (t) and Z n (t): 

Pn(t) = £ [/M#«,n(*M(6M*)) 

UGAn 

by 

(1.12) o 2 n {t) = l-(p n {t)^-\t)p n {t)). 

Observe that in the Gaussian limit Z(t) and Z(t) become independent, and 
hence cr n (t) converges to one. 

Sufficient regularity conditions for (1.6) are stated in Theorem 4.9 below. 
In Section 2 we outline the principles of the method for producing expansions 
of the probability (1.5) in smooth random fields. Section 3 presents some 
numerical examples. In Section 4 we provide details regarding the approxi- 
mation of the significance level, with some proofs deferred to an Appendix. 
Section 5 discusses some extensions. 
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Remarks, (i) There is a large related literature in the case of Gaussian 
fields, for example, [3, 4, 7, 13, 16, 17]. See [1] for a review up to 2000 
and additional references, and [18] for an outstanding recent contribution. 
To our knowledge the non-Gaussian case is relatively unexplored, and even 
then the published results are for relatively simple concrete problems, and 
are derived heuristically. See, for example, [11] and [10]. 

(ii) In the first integral in (1.6), the differential |A n (t)| 1 / 2 dt is easily rec- 
ognized as the volume element for a manifold with metric tensor A n , exactly 
as one knows it must be from the familiar case of a Gaussian random field. 
Similarly, one knows from the Gaussian case that the appropriate differen- 
tial in the second integral (i.e., the product of all factors in the integrand 
except the factor e~^ n ^) must equal the volume element for the boundary 
of the manifold with metric tensor A n . It seems easy to prove this result in 
special cases, although difficult in general. [E.g., one can show the equiva- 
lence when the boundary of T is given (locally) by sets like the set where 
g(t) =td — f(t±, . . . ,td-i) equals 0, for a suitable smooth function /.] How- 
ever, since application of equation (1.6) does not rely on this interpretation, 
we do not pursue the argument here. 

(iii) In exponential change of measure arguments, one often chooses the 
parameter, here £, so that under the new distribution the process of interest 
has expectation exactly equal to the threshold x. Since this would result in a 
nonlinear equation for £, we find it simplifies some Taylor series expansions 
to use a slightly different value, which has an explicit form, and would be 
the conventional value if the process were exactly Gaussian. 

2. Outlining the method. Our method for expanding probabilities of the 
form (1.5) is based on a application of a likelihood ratio identity, followed 
by local expansions. It is very similar to the approach that has been applied 
in previous work, which was, however, related to fields of Brownian-motion 
type. See [14, 15], and [20]. However, in the current work some steps have 
been modified in order to exploit the smoothness of the random paths. The 
application of the methodology is split into six building blocks, which are 
described below. Details are given in Section 4. 

Measure transformation: The first step involves a likelihood ratio identity. 
This transformation allows us to recenter the analysis in a setting where the 
probability of crossing the threshold is substantially larger and where the 
central limit theorem is more likely to be applicable. 

Localization: The likelihood ratio identity produces a functional of the 
random field. Conditioning on the signal, that is, the parameter point se- 
lected by the alternative distribution, it is argued that the value of the 
functional is determined mainly by the local behavior of the field about that 
point in the parameter space, so the original field can be replaced by a finite 
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order Taylor polynomial and the computation of the functional restricted to 
a smaller subset of the parameter space. 

Application of a central limit theorem: The local term that emerges from 
the previous step is a functional of the value of the process at the signal 
location and of derivatives of the field at that point. The functional is ap- 
proximated by a similar functional applied to an appropriate multinomial 
distribution. 

Elimination of the indicator: The functional has an exponential compo- 
nent, a component in the form of an indicator and components that are 
essentially polynomial in the derivatives of the random field. By condition- 
ing on these derivatives and using a Mill's ratio type of approximation, one 
can eliminate the indicator. 

Evaluation of the functional: In theory this step, which involves approxi- 
mation of the functional of the derivatives by polynomials and the evaluation 
of the expectation of the resulting Gaussian polynomials, is straightforward. 
In practice, it is tedious to apply if higher order approximations are sought. 
It is at this stage that boundary effects become significant. 

Integration over the parameter space: The analysis in the last four steps 
is carried out at the signal location. A final integration of the functionals 
over the parameter space must be carried out to produce an approximation 
to the probability of interest. 

3. Simulation studies. In this section we examine via simulations the ac- 
curacy of (1.6) for one example of an approximately Gaussian random field. 
The simulations were programmed using the C++ language. Probabilities 
were approximated using 5,000 iterations of the simulations, which corre- 
sponds to results accurate up to the second digit after the decimal point. 

Consider a background field of independent Bernoulli random variables 
covering the standard unit square on a fixed dyadic grid A n = {(i/2 m , j/2 m ), 
< i,j < 2 m }, so n = (2 m + l) 2 . A signal is composed of a straight line that 
passes between a point on the intersection of the unit square with the ver- 
tical line x = and a point on its intersection with the vertical line x = 1. 
This figure can be imagined to be a very primitive "edge" in a two dimen- 
sional image. More interesting examples along these lines would involve, say, 
broken lines with some maximum number of breaks at unspecified positions. 
The parameter space T is again the unit square, with each parameter point 
representing the vertical levels of the leftmost and the rightmost points of 
intersection, respectively. For the function 9 u (t), which measures the close- 
ness of a point u G An to the signal t E T, we use 9 u (t) = exp[— D ■ d(u,t) 2 /2], 
where d(u,t) = mino< p <i \\u — [p(0,ii) + (1 — _p) (1, ^2)] || and D is a positive 
scale parameter. The random variables W u were taken to be Bernoulli with 
probability of success po = 0.1. When a signal is present, this probability is 
shifted to pi =Po/\po + (1 - Po)e~^ 9u( -^}. 
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Table 1 

Comparison between a Gaussian approximation (pg) and the approximation (pe) based 
on (1.6). Empirical significance level (p) was estimated from 5,000 iterations (SD 

e [0.0015,0.003]) 





D 


= 10 






D 


= 20 






D 


= 50 




X 


P 


Pe 


PG 


X 


P 


Pe 


Pg 


X 


P 


Pe 


Pg 


2.5 


0.046 


0.036 


0.026 


2.8 


0.044 


0.043 


0.023 


3.1 


0.042 


0.088 


0.024 


2.6 


0.036 


0.029 


0.020 


2.9 


0.034 


0.035 


0.018 


3.2 


0.036 


0.075 


0.018 


2.7 


0.029 


0.024 


0.016 


3.0 


0.030 


0.029 


0.014 


3.3 


0.034 


0.064 


0.013 


2.8 


0.021 


0.020 


0.012 


3.1 


0.021 


0.024 


0.010 


3.4 


0.024 


0.054 


0.010 


2.9 


0.015 


0.016 


0.009 


3.2 


0.015 


0.019 


0.008 


3.5 


0.020 


0.046 


0.007 


3.0 


0.013 


0.013 


0.007 


3.3 


0.014 


0.016 


0.006 


3.6 


0.013 


0.040 


0.005 


3.1 


0.009 


0.010 


0.005 


3.4 


0.012 


0.013 


0.004 


3.7 


0.010 


0.034 


0.004 



Table 1 compares, with m = 5 (n = 1,089), the simulated tail probability 
(p) with the analytic approximation (p E ) in (1.6), and a parallel approxi- 
mation based on treating the field as Gaussian (pa), for which (1.6) is used, 
but with 5 n = = r n . Three values, D = 10, 20, 50 and a range of thresholds 
x corresponding to p- values that vary between 0.05 and 0.01 were examined. 
Overall, when the values of the scale parameter are not too large (D = 10, 20) 
the approximations given by (1.6) produce good results and are more ac- 
curate than an approximation based on the Gaussian limit. When D = 50, 
only a small fraction of the background observations contribute to the score 
statistic, so the marginal distribution of Z(t) is poorly approximated by a 
normal distribution. The Gaussian approximation is anticonservative, while 
(1.6) overcompensates and is very conservative. It would be interesting to 
know whether the large deviation like approximation suggested in [10] can 
be adapted to deal with this case. 

In Table 2 we compare the approximated p- values for m = 5 (n = 1,089) 
versus m = 6 (n = 4,225) when D = 17. Since the accuracy of the approxi- 
mations is comparable in both cases one is tempted to conclude that (1.6) is 
stable with respect to the sample size in the range where the approximation 
is valid. 

4. Detailed proofs. In this section we add the details to the outline that 
was presented in Section 2. Throughout, we will try to keep the discussion 
as general as possible in order to lay the foundation for the extension of the 
method to other models and to higher order approximations. 

4.1. Measure transformation. We begin by transforming the null proba- 
bility measure under which the field is centered and the probability in (1.5) 
is relatively small to an alternative probability. 
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Put a uniform prior over the parameter space T. Let Pf = P$ £ t correspond 
to the probability measure for which, given t, the random variables {W u } 
are (conditionally) independent and are distributed according to (1.1) with 

— 1/2 

parameter £t# u (t), where £ t = x/„ (t) was defined in (1.8). Let ¥ x be the 
unconditional distribution of {W u }, so 



(4.1) 



p*(0 



1 



A(T) 



for A(T) the Lebesgue measure of T. 

The log-likelihood ratio of Pf relative to P is given by (1.3) with £j in 
place of £. Hence, the likelihood ratio of P^ relative to P can be written as 



(4.2) 



dF, 



1 



5 J»(«,6) d s . 



dP A(T) J T 
Applying a likelihood ratio identity, one obtains 

1 



sup Z n (t) >x) = 
teT J Jt 



; sup Z n (s) > x 



dt. 



Simple algebraic manipulations lead to the representation in the form 

- e -x{Z n (t)~x) 



(4.3) = x d e~ x2/2 



-<5n(*)]g 



T 



;x(Z n (t)-x)+logM>0 



dt, 



where for each t, M and S are defined in terms of the random field Xt(s) = 
Xt,x,n(s) = x ( z n(s) - Z n (t)),s G T, by the equations M = sup sgT exp[X t (s)} 
and S = J seT x d exp[X t (s) + S n (s) — 6 n (t)] ds, with 6 n (-) as defined in (1.10). 



Table 2 

Comparison of the approximation (pe) between m = 5 and m = 6 when D = 17. 
Empirical significance level (p) was estimated from 5,000 iterations (SD 

G [0.0015,0.003]) 



m = 5 m = 6 













X 


P 


PE 


P 


PE 


2.6 


0.049 


0.054 


0.045 


0.043 


2.7 


0.040 


0.044 


0.037 


0.034 


2.8 


0.032 


0.036 


0.031 


0.027 


2.9 


0.029 


0.030 


0.020 


0.022 


3.0 


0.020 


0.024 


0.019 


0.017 


3.1 


0.019 


0.020 


0.015 


0.013 


3.2 


0.013 


0.016 


0.009 


0.010 


3.3 


0.009 


0.013 


0.006 


0.008 
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4.2. Localization and a Taylor approximation. In this section we con- 
centrate on the integrand in representation (4.3). The term M produced by 
maximization and the term S produced by integration will be replaced by 
similar, but more tractable, expressions. The replacement is two-fold. First, 
a subset of the parameter space is used instead of the complete parameter 
space. This subset, which we denote by Vt, is a neighborhood of t with radius 
depending on the threshold x and on the sample size n. The resulting local 
random field is denoted by Xt = {Xt(s),s G Vt}. Second, this local field is 
replaced by its Taylor approximation about t (to a certain order). Denote 
X t = {X t (s),s S Vt} to be the random field generated by these two approxi- 
mations. The main subject in this subsection is to show that by taking the 
radius of Vt to be of an appropriate order and by taking enough terms in 
the Taylor expansion of the field, one may control the error involved up to 
a pre-prescribed level of accuracy. 

For r > 0, let B(r) be the d-ball of radius r, as defined by the Euclidean 
metric. Denote the elements of any d-dimensional vector by subscripts, for 
example, s = (s±, . . . , Sd). For every fe-tuple ii,...,ik from {l,...,d}, and 
a d-dimensional vector x let a^,...,^ = Ylj^x^. For every function on T 
and on T x T we denote partial derivatives using superscripts. For example, 
flun(t) 18 the second order partial derivative of /3 u ,n(s) with respect to Si,Sj, 
evaluated at s = t. Throughout, we use the Einstein summation convention 
(cf. [5], pages 136-137). 

We will consider the case where the information regarding £ at t, I n (t) = 
J2ueA„@u(t)i * s °f the order of magnitude of the sample size n. This as- 
sumption clearly holds for a functions 9 which are uniformly bounded and 
bounded away from zero over A n x T. We assume henceforth that uniform 
boundedness holds both for and for its derivatives, up to an appropriate 
order. We also assume boundedness of ip and its derivatives. 

By (1.4) we can decompose Z n (s) as 

(4.4) Z n (s) = U n (s) + ]T 

u£An 

where U n (s) = J2ueA„ Pu,n(s){W u - ip' u (£tOu(t))) ■ Observe that U n {s) is a 
centered variable under the probability measure Pj. The second component 
is deterministic. The investigation of the localized field Xt(s) = x(Z n (s) — 
Z n (t)) amounts to expanding f3 u ,n( s ) m a neighborhood of t. One needs to 
evaluate the derivatives of P u ,n(t) explicitly in order to proceed with the 
analysis. The derivatives are incorporated separately with respect to the 
random part and the deterministic part. 

Consider in detail the first two derivatives in conjunction with the de- 
terministic part. Other derivatives (random and deterministic) will have, by 
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the choice of the neighborhood, less of an effect on the overall value of Xt(s). 
Direct computation gives 

Pu,n{S)- , , V U \S), 

In (s) In (s) 

■A f s_ 0u(s) Ju{s)®J n (s) 
Pu,n\S) — i/o z " 



/n /2 (s) I^ /2 ( S ) 

?«(s) j n {s) u (s)J n (s) <g) J„(s) 



I^ /2 ( S ) /* /2 ( S ) ' 

where J n (s) = J2u&A n G u (s)6 u (s) [so that I n (s) = 2J n (s)]. 

In the following we let 0(y) denote a function that is bounded above and 
below by expressions of the form const x y. 

By assumption, the first derivative term Eugi i3u,n(^'te^ti(i)) is °f 
the order of magnitude of 0(x 2 n -1 / 2 ), and is denoted by r±. The second 
derivative term, J2 U £An Pu,n{t)ip'(£tOu(t)), can be written as — xA n (t) + r2, 
where r 2 is an 0(x 2 n _1//2 ) term, and A n (t) is a 6(1), nonnegative definite 
matrix, given in (1.9). 

Let a > and set 

(4.5) V t = t®An 1/2 (t)B(logx/x). 

Each element of the approximating field {X t (s),s S Vt} is defined via a finite 
expansion: 

2 

X t (s) = x(s - t, U n {t)) - y (s - i, A n (i)(s - i)) + n + r 2 
(4-6) +E^( s -^i,...X 1 ""* 4fe (i) 

fc=2 fc - 

fc=3 ' U€-4 n 

In the case a = 0, only the first line of (4.6) is used. For a = 1, which is 
relevant for the results presented in Section 1, one term from the second line 
and one term from the third line of (4.6) are also required. 
Denote the remainder of the Taylor expansion by 

(4.7) r t {s) = r t>x>n ( S ) = X t (s) - X t (s). 

Define 

So = f x d e x ^+ s "^- s "V ds, M = sup e x ^ s \ 

Js&Vt s£Vt 
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So 



x d e Xt(s)+6 n (s)-S n (t) dg 



eVt 



M = sup e Xtis) . 

s&Vt 



We also set X = x(Z n (t) — x). The main result of this subsection is: 

Theorem 4.1. A ssume that T is compact. Suppose further that u ('*j 
belongs to C Q+3 , and that all derivatives up to this order are bounded from 
above in Vt- Assume also that ip u {') "is three times differentiate with bounded 
derivatives over an open interval that contains the origin. Finally, assume 
that I n (t) is @(n), the smallest eigenvalue of A n (t) is bounded away from 
zero, and the largest eigenvalue is finite. Take e= Q(x~ a ). Then 



E, 



-X 



S 



-;X + logM>0 



-x 



;X + logM>0 



<e £ E t 



> 



-x 



So 



■;X + logM >-e 



+ o(x~ 



l + e 



-Et 



So 



■;X + logM > e 



+ o(x- a ). 



Remarks. In the proofs given in this subsection we do not mention 
the contribution of S n (t) explicitly. In many interesting cases, which include 
for example the classical medium deviation of a normalized sum of i.i.d. 
variables, one considers threshold levels x that satisfy x = o(n 1 / 6 ). For such 
x, 5 n (s) is a remainder term which tends to zero as x and n tend to infinity. 
Therefore, ignoring it throughout the various lemmas below will present no 
loss of generality. In other applications, where x may tend to infinity slightly 
faster, for example, x = o(n 1 / 4 ), an appropriate choice of the constant C in 
Lemma 4.2 and the bound over Mq/Sq, given in (4.17) below allow one to 
dominate S n (s) and proceed with the proof. We reintroduce 6 n (s) — 5 n (t) in 
Section 4.5, where its exact contribution becomes relevant. 



Proof of Theorem 4.1. Roughly speaking, the theorem states that 
the error committed by switching between S,M and So, Mo, respectively, is 
sufficiently small for our subsequent calculations. Therefore, one must make 
sure that (i) the remainder is small, and (ii) the contribution of quantities 
outside Vt are negligible. To this end, the region T \ Vt is covered with K 
balls of radius 1/x 2 . Let V{, for 1 < i < K, denote a generic ball centered 
at Ti G T \ Vt- Let Si and Mj denote the analogues of So and Mo for such 
a ball. Note that M = max{Mo, Mi, . . . , Mk} and S > max{5o, Si,..., Sk}- 
Since Mo > 1, we see that Mj > 1 on {M = Mj}. The remainder of the proof 
consists of obtaining suitable upper and lower bounds. 

The upper bound is obtained by a localization argument and a Taylor 
approximation (in that order): 



E t 



-x 



X + logM>0 
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\e~ x 


<E t 


S 

K 




i 


=1 


<E t 


[ So 


= K t 


. So 



;X + logM>0,M = M 

;X + logAf >0,M = Mj 



-A 



5 



;X + logM >0 



i=l 



f;M,>i 



+ E t 



<e £ E, 



K 



;X + logM >0,sup|rt(s)| <e 

s&Vt 



e~ x 

— ;X + logMo>0,sup\r t (s)\>e 

JO s&Vt 



K 



i=l 



L &i 



-X 



So 



■;X + logM > -e 



+ IE, 



— ;sup|r t (s)| >e 



+ E E * 



i=l 



Mi 

5; 



-,M>1 



The lower bound is obtained by going the other way: 
-x 



E, 



So 



■;X + logM >e 



IE/ 



-A 



-x-; X + log M > e > sup |r t (s)| 

O0 sG^t 



+ IE, 



<e e Ey 



-A 



-^;X + logM >£,sup|r 4 (s)| >e 



-A 



— ;X + logM >e,sup|r t (a)| <e 

*0 s€V t 



+ e~ £ E t 



-x-; sup \r t (s)\ > e 
£>o s&Vt 



-x 



<e £ E t 



e £ E 



•Si 



-;X + logM >0 



+ e~ £ E t 



- 7r ; sup \r t (s)\ > e 

So s&Vt 



-X 



Sa 



■;X + logM >0,S<(l + e)So 



+ E t 



So 



;S>(l + e)S Q 
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+ e~ £ E t 



Mi 



; sup \r t (s)\ > e 



So seVt 
-x 



<e e (l+e)E f 



S 



-;X + logM>0 



+ e £ E t 



M n 



■;S>(l + e)S 



+ e~ £ E t 



-7T-; sup |r t (s)| > e 

OQ S&Vt 



Therefore, 



-x 



> 



-;X + logM>0 



1 + e 



-x 



l + e 



So 

Mo 
So 



;X + logM >£ 
S>(l + e)S 



-2c 



l + e 



-E t 



-g-;sup|r t (s)| >e 

OQ s£Vt 



The proof proceeds through a sequence of lemmas, which show that the 
various error terms are small. Many statements along the way will hold 
true only up to a constant factor. We use D,Di,D2, and so on, to denote 
such (positive) constants. Occasionally the same symbol denotes different 
constants. □ 

We start the sequence of lemmas by putting on record the elementary fact 
that the expectation of a nonnegative random variable over an event can be 
controlled by the tail of its distribution: 

Lemma 4.2. For any nonnegative random variable Y , any measurable 
set A, and any positive and finite C, 

POD 

(4.8) E[Y;A]<CF[A]+CF[Y>C}+ / P[Y > y] dy. 

Jc 

Proof. The proof is elementary and is omitted. □ 

The second general lemma relates the maxima of a given (deterministic) 
function to the integral of the same function. 

Lemma 4.3. Let h : B(r) — > M be a continuously differentiate real-valued 
function over a closed ball of radius r. Let H = max zeB ( r ) ||M Z )II • Then, for 
some positive constant D, 



(4.9) 



/ exp< h(y) — max h(z) > dy> (H + D/r)~ 
JB(r) I z£B(r) ) 
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Proof. The proof is given in the Appendix. □ 



The next lemma takes us back to the specific terms analyzed in Theo- 
rem 4.1: 



Lemma 4.4. Assume the conditions of Theorem 4.1. Then 
M 



(4.10) E t 
and the same holds for Mq / So . 



, ;sup|n(s)| >£ 
^0 seVt 



o(x~ a ), 



Proof. The proof is based on Lemma 4.2. Set A = {sup se y t |r^(s)| > e}, 
and C = n Dlogn , for some positive constant D. We begin by showing that 



(4.11) 



n Dlognp t 



sup \r t (s)\ > e 
seVt 



o{x~ a ). 



Recall that U%>~^{s) = T.u & A n Pl\n' lk {s)(W lL - ^ tt (&0«(t))). We write 

briefly /?$(•) and ui k \-), or even /3 u>n (-), U n (-) when no confusion is likely. 
The remainder is absolutely and uniformly bounded in Vt by 



Di(logx) a+2 x- (a+1) 



(4.12) 



max sup\ut +2 \s)\ 

1<H,... ,I a +2<0 S 6Vt 



+ D 2 (logx 



a+3-(a+l) 



since U {-) and its derivatives are uniformly bounded by assumption. 
The bound (4.12) on the remainder leads directly to: 



(4.13) 



sup \r t (s)\ > e 



sup|[4 a+2) (s)|> Dx(logx)- {a+2) 

seVt 



where the sum expands over {1, . . . , d}( a+2 ^ . Note that the inequality is valid 
for any x which is greater than D(log:r)( a+3 ). The probability on the right 
hand side of (4.13) is at most the sum of similar tail probabilities of the 
random field U n (s) and its negation. These probabilities differ only by a 
constant. We bound the tail of a bounded field by the expectation of an 
exponentiated field. Such expectations are investigated next. 

Write Ft(-) for the probability distribution function, under the alternative 
probability P t , of the collection {W u ,u S A n }. The expectation 
Et [sup sg ^ t exp{C/ n (s)}] is given, upon dividing and multiplying by 
f v exp{U n (s)} ds and using Fubini's theorem, by 



(4.14) 



sup seVt e 



U n (s) 



f Vi e u »( s )ds 



x e Un{r) dF t (w)dr, 
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where the innermost integral (or a sum for discrete models) is with respect 
to the sample space. Using a suitable exponential tilting, obtained by con- 
sidering the cumulant generating function of {W u } under Pf, we have that 
(4.14) 



(4.15) 



x E, 



Vi 



sup sg y t e- 
J Vt e u ^ ds 



dr, 



for A(r,t)=VU6^) + /3& +2) (^ 

that the probability distribution P r , or equivalently its distribution function, 
belongs as does the probability Pj to an exponential family. The exact form 
of the family may be written explicitly but is not essential for the proof. 
Taking one additional term in the Taylor expansion for ipu(^u(t) + 

Pu^n~ 2 \r)) we can write A n (r,t) = ^(^n^ir)) 2 ^"^), for some point ■# close 
to £,tO u {t). Then, since ip" are uniformly bounded, we get an upper bound 
on (4.15): 

AW- 



n 



E, 



J Vt e u ^ ds 



dr. 



(4.16) 

Define h(y) = U n {t + \kn 1/2 {t)y) and B = B{\ogx). The expectation in (4.16) 
is, by Lemma 4.3, smaller than or equal to the expectation under P r of 

{H + D/\ogx)\ for H = max zeB \\±An 1/2 {t)U n (t + ±An 1/2 {t)z)\\. It is suffi- 
cient to consider E r (H d ), or even E r (J2 u€An \W U - <(&0 u (i))|) d . B Y inde- 
pendence, and since W£ are integrable with respect to P r , the expectation 
exhibits a growth which is at most polynomial. Assertion (4.11) now follows 
from Chebyshev's inequality. 

Consider next the tail of Mq/Sq. In the application of Lemma 4.3 we may 

identify B by B(logx) and h(y) by xZ n (t + ^A~ 1/2 (t)y). Then S /M is in 
the required form: | A n (t)| -1 / 2 J B exp{h(y) — max 2g s h(z)} dy. It follows that 



(4.17) 

where H < Dmax s£ v t 
equality, 



^<D(H + D/logx) d , 
\Z n (s)\\ < Dn-^EueAnlWu]- By Chebyshev's in- 



(4.18) 



n Dlogn 



c2 — y 



dy 



<Dn~ Dlogn E t (Dn^ 2 ^ \W U \ + D/\ogx 
V ueAn 



2d 



Clearly, both CF t [M /S Q > C], and Jj? 
quested. 



[Mq/Sq > y] dy are o(x a ) as re- 
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The tail behavior of Mq/Sq can be managed almost identically. Let E 
denote the bound (4.12) on the remainder. Hence, Mq /So < e 2= Mo/So, and 
the problem may be reduced to the evaluation of J n C Di OE »Pi(e 2 ' = > y 1 ^ 2 )dy, 
and J^iogn F t (Mo/So > y 1 ^ 2 ) dy. The latter is clearly in the appropriate 
order of magnitude [raise each side of inequality (4.17) to the power of 4]. 
The former can be managed by raising each side to the power of 8, say, 
and using the method taken in the evaluation of the expectation of the 
exponentiated field. □ 

Lemma 4.5. Under the conditions of Theorem 4.1, 
-M 



(4.19) E< 



„ ;S>(l + e)S 



o(x- a ). 



Proof. The proof's structure is patterned after Lemma 4.2. This time, 
we set A = {S > (1 + e)So} and take as before C = n ologn . Since the radius 
of a generic ball Vj, is x~ 2 we obtain that the total number of balls that 
are needed in order to cover T \ 14 is proportional to x 2d . However, since 
the elementary inclusion {Y^i=l^i > u} ^ UiLii^i > UPi} holds for any se- 
quence of nonnegative numbers p\,...,p n that add up to 1 and any sequence 
Yi,...,Y n of nonnegative random variables, it is sufficient to evaluate the 
probability of \jS*{Si > e Pi S }. We take Pi = x~ 2d /D. 

A lower bound on So may take the form: 

So > DeyLv\-D(\ogx) 2 x- 1 max ||17 n (a)|| 1. 

As for Si, we decompose X t (s), given the central point Tj, as: 

X t (s) = x(Z n (s) - Z n (ji)) + x{Z n (ji) - Z n (t)). 

The first component is bounded by (loga;)x _1 max^g^ ||C/ n (s)|| +0(1). The 
second component may be decomposed further to produce 

(4.20) x{u n {n) - u n (t)) + xY, WuAn) - /M*M(£A(*))- 

The sum in (4.20) is equal to x(J2 U £A n @u,n( T i) Pu,n{t) — 1), up to an 0(x 2 n~ 1 / 2 ) 
term. But, J2ueA n Pu,n{Ti)/3u,n(t) 1S (asymptotically) the correlation, under 
the transformed measure ¥ t , between U n {Ti) and U n (t), and as such, it is ab- 
solutely bounded by unity. It turns out that the second component in (4.20) 
is a negative multiple of x 2 . 

Gathering the various random and deterministic terms, one sees that to 
complete the proof it is sufficient to bound the terms Pt(max se y i ||C/n(s)|| > 
x 3 /logx), P t (max se y t \\U n {s)\\ > x z /\og 2 x), and ¥ t {U n {n) - U n (t) > Dx). 
The probabilities involving the maxima over 14 and 14 can be handled by 
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the methods used in Lemma 4.4. Notice that the thresholds in this case are 
even larger than the ones we were using before. The probability associated 
with U n {ji) — U n {t) decays exponentially. This can be verified, for example, 
by Chebyshev's inequality since the exact form of the cumulant function of 
W u is known. □ 

Lemma 4.6. Under the same conditions as before, 



(4.21) J2 K 



K ' Mi 



t 

i=i 



o(x- a ). 



Proof. The handling of Pt(Mj > 1) may be carried out similarly to 
Lemma 4.5 by using the decomposition of Xt(s) considered there. The quo- 
tient Mi /Si can be analyzed using Lemma 4.3. □ 

Lemma 4.7. Under the same conditions as before, 



(4.22) Et 



e -x(Z n (t)-x) 



So 



;x(Z n (t) -x) + log M >0 



< oo. 



Proof. The investigation of the tail behavior of Mq/Sq shows that it is 
integrable with respect to Pf. This is sufficient in order to prove (4.22). □ 

4.3. Normal approximation. Let U n = U n {t) = (uji\t), . . . , U n a+l) {t)f 
denote the vector of partial derivatives of U n {t), and set U n = U n {t) = (U n (t), 

^n(i))*- We treat U n k \ the fcth order partial derivatives, as a ( fc+ f -1 )- 
dimensional vector and lA n is a large column vector. The expectations we 
would like to evaluate as a result of Theorem 4.1 are of the form E, t [G(U n )], 
for an appropriate function G. 

The target at this stage is to show that 

(4.23) E t G(U n ) = E t G(U)P(U) + o(x~ Q ), 

for U = U{t) = {U{t),N{t)Y , a normally distributed (under F t ) random vec- 
tor with zero mean and variance-covariance matrix denoted by C n (see 
below) and for P an appropriate polynomial. Equation (4.23) corresponds 
to an application of a central limit theorem to a sum of independent vectors 
that are not identically distributed. If U n possesses a density then one may 
use Theorem 19.3 of [6] in order to expand the density to the required ac- 
curacy and get the result. If U n is discrete, as is the case in the numerical 
example we presented, different tools are required. 

We give below an argument corresponding to a = 1 and general distri- 
butions with a finite forth moment. We shall assume that (a) the functions 
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an d ^w(') an d their derivatives are bounded (see Section 4.2), and 
that (b) the smallest eigenvalue of the covariance matrix C n is bounded 
away from zero. Note that for a = 1 we have that P = 1. 

Let d n = .Dlogn. The basic strategy we apply in this subsection involves 
three steps. In the first step, the expectation is evaluated separately on 
two complementary events, {||W n || < d n } and {||^/ n || > d n }, with the latter 
shown to be negligible. The second step utilizes bounds for errors of normal 
approximation in order to establish (4.23) for expectations restricted to the 
first event. As a result, U n is replaced by IA. The third step reverses step 
one, but in the normal setting. The first step uses techniques similar to those 
applied in the proof of Lemma 4.4. In [9] one can find a discussion of the 
expansion of the distribution of the maximum for Gaussian random fields, 
which involves some modification of the tools that are used in empirical 
random fields. As a corollary of that discussion, it will be possible to produce 
a parallel of Lemma 4.4 and to prove the third step. Details for the first and 
the third steps are omitted. Henceforth, we concentrate on the second step. 

We shall need the following notation. Let 6 n n 
where j3 u n {t) is considered as a vector, and thus U n {t) = J2u n(t) [W u — ip' u ] , 
for < = V4(&0u(*))- Define C n = £ u Cov (b u , n (t)[W u - <]), and let 

X u = X u , n (t) = n l l 2 C~ l l 2 b u , n {t)[W u - V<J. 

Let V„(t) = EuXu = n^ 2 Cn 1/2 U n (t) be the sum of the independent vectors 
and let Q n be the distribution of W„(i) = n" 1 / 2 V n (t) = Cn l/2 U n {t). 
With those notations in mind, write 

E t [G(U n ); \\Un\\ < d n ] - Et[G(U); \\U\\ < d n ] 

(4.24) 

' f(w)d(Q n (w)-$(w)), 



1/2 

where $ is the Gaussian measure, and f( W ) = G{Cn' w)L nnl/2 .... v 

The term in (4.24) may be bounded with the aid of Theorem 13.3 of [6], 
which states that 



J fd{Q n -<$>) < W/ 0R fc )a 3 (A:)p4n- 1/2 



(4.25) 

where 03 (k) is a constant that depends only on k, the dimension of U n and, 
for pi = n' 1 J2 u M x u\\\ a = 3,4. The term Uf(A) = sup{\f(x) - f(y)\:x,y£ 
A} is the modulus of oscillation of the function / over A and, for e > and 
a measure p, 



U4 



e;p)= sup / u!f(B(e) + x — y)p(dx). 

y£R k J 
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The fourth moment is bounded by n\\C n l ^ 2 \\ A Yl<u II I | 4 ^t[W« — V4] 4 - 
Under the assumption regarding the boundedness of 9 U {-), tp u (-), and their 
derivatives, we have that n^ u \\b u ^ n {t)\\ 4 ¥. t \W u — ifj 1 ^ 4 = Q(l). Moreover, 

||C n 1 ^ 2 || 4 = (A m i n (C n ))~ 2 , where A m i n (C n ) is the smallest eigenvalue of C n , 
which is bounded away from zero. Thus, the (averaged) fourth moment is 
finite. 

In the assessment of wj and uj*j- it is enough to consider /(u) for u = 
C n w, since C n is an 0(1) matrix. For such u, 

/(u) < e £ -^I {M < dn} . 
bo 

We regard here lA n as fixed at the value u= (u,u,u). Direct maximization, 
and the use of elementary inequalities [such as F^^^^y 2 ) > e~ e2 ^ 2 F x 2^(y 2 ), 
which features a relationship between the c.d.f. of a noncentral chi-squared 
distribution and a central one] lead to 

I a ic\ t s n «'A; 1 (t)ti/2 , ^ n 1/4 t 

(4.26) -^I {M < dn} <De u i l^uWK^n} < Dn ' /{|| u ||<d„}- 

Do 

Therefore, / is bounded, and the first term in the right-hand side of (4.25) 
is of the order of mag nitude of Oin' 1 ' 4 ). 

Consider next wj. Let A be any subset of R fc . The modulus of oscillation 
of the indicator function, I a, over the ball B{e) + x, is 

(4.27) w* lA (e-<5>)= sup / I {aA y (x - y)<P(dx) = sup <5>{(d(A + y)) £ ), 

where A 6 is the set of points whose distance from A is less than e. Applying 
Corollary 3.2 of [6] (with s = and p = e) we obtain that for a convex set 
A, Wj a (e;Q) < b(k)e, where b(k) is a constant depending on k only. Hence, 
since the set {\\u\\ < d n } is convex and / is bounded, we conclude that 

w}{e; $) < Dn x l%k)e = O^" 1 / 4 ). 
This completes the proof that (4.25) is bounded by 0(n _1//4 ) = o{l/x). 

4.4. Elimination of the indicator. This section proceeds the analysis of 
the two (compactly written) terms, 

E t [e- x /S ;X + logM >±e], 

which originated from the analysis in the previous sections. In order to 
remove the dependency on the event, we proceed in two steps. The first step 
involves a conditioning argument, where the conditioning here is with respect 
to the (T-field generated by the derivative components N. These components 
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generate the local random field. Note that both So, and Mo are measurable 
with respect to this cr-field. The second step presents a Mill's ratio type 
of an approximation of the innermost expectation. Recall that (U,t>() are 
jointly Gaussian, hence the conditional distribution of U, given the cr-field, 
is normal. We apply a simple lemma which is proved in the Appendix: 

Lemma 4.8. Let Y ~ N(fi,a 2 ). Suppose x — > oo and y — > 0, such that 
xy converges to a constant. Then, 



E[e- xY ;Y>y] 



e -xy 



y-l* \ . y (~l) m (2m)! 

' rn=0 



X + 



a 2 



-(2m+l) 



+ r, 



where 4>(-) is the standard normal density function andr = o([x + ^r-] ( 2fe + 2 )). 

Let Tt = o~(pt) denote the cr-field generated by the fcth order partial deriva- 
tives of the (centered) field at t, for 1 < k < a + 1 and for some a < 2. Let 
Z(t) = U(t) + x be a random variable which, under ¥ t , is normally dis- 
tributed with mean x and variance Vai t (U(t)). Let p = Cov(Z(t),t>() and 
£ = Cov($/) [recall that C n = Cov(U)]. Observe that the conditional mean 
of Z(t) — x is 

(4.28) n = E t (Z(t)-x\F t ) = {p,Z- 1 M), 

while the conditional variance is cr 2 = Vait{Z{t) — x\!Ft) = 1 — {p^~ 1 p)- 
Therefore, conditioning on Tt, taking y = (— log Mo =t e)/x — r n (t) and ap- 
plying Lemma 4.8 we obtain, up to a o(x~ a ) remainder, 



E, 



e~ x 



■■X + \ogM >±e 



So 

M e- l l {2a ' 2 \p + (logMo/x) t e/a; + r n (t)) 2 



e* £ 



(4.29) = Et 

x 



S V^ 2 
L ^ J [(-l) m (2m)!]/[x 2m cj 2m 2 m m!] 



x 



f (1 - [m/x + (logMo/x 2 ) T e/x 2 + r n (t)/x]/cr 2 ) 2m + 1 



The difference r n (t) = Y, u€An ^,ti( { )t(^uW) ~ x was defined in (1.11). 

Although the expression in square brackets is pretty involved, note that 
the dependency on the event is removed and the result is an expectation of 
a function of the derivatives. The next section shows a general algorithm to 
produce an explicit approximate evaluation of this expectation. 
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4.5. Evaluation of the functional. The expression in the square brackets 
on the right-hand side of approximation (4.29) has four significant terms: 
Mo, So, 4>{—\ogMo/xa — \ijo — r n (t)/a), and the finite sum of rational 
functions. In order to obtain an approximation with error term of the order 
of magnitude of o(x~ a ) we show below how each term can be expanded 
as power series in 1/x. Clearly, e can be ignored in what follows. Except 
for the second term, So, the three other involve the maxima M> We leave 
the discussion about So to the end of the section, and commence with the 
maxima. 

For each t, the function log Mo is the maximum, subject to constraints, 
of a polynomial. The coefficients are functions of the random and determin- 
istic derivatives. The constraint is the neighborhood Vt of t. The dominant 
part in the polynomial Xt(s) are the linear component associated with the 
random gradient and the quadratic component associated with the deter- 
ministic Hessian. We ignore here the issue of the constraint, since it does not 
affect the evaluation to the order of accuracy considered here, and proceed 
with the unconstrained maximization. We reintroduce maximization under 
constraints when we deal with boundary effects, for which constraints are 
significant and may be active. 

The change of variable y = x(s — t) will produce the representation 

f(y) = X t (y/x + t) 

a 

= (y,U(t)) - ^(y,A n (t)y)+J2 x ~ j l9j+i(y,U)+h j+2 (y,(3)}, 

where 

fiid/.r) .</; ,J" ■■(') ./!• 

h j (y,P)=yi 1 ,..., ij £ pi]r ij ^t0u(t))/xjl 

UGAn 

The remainder terms r\,T2 that appear in (4.6) are discarded in the above 
but may be introduced. In general, the level of accuracy a determines 
whether remainder terms (which depend both on x and on n) will affect 
the final result or not. Let f(y) = q(y) + r{y), where q{y) is the dominant 
quadratic part, and r(y) is the remainder. We denote the maximum value 
log Mo by f(y*), for y* the maximizer. In order to approximate this maxi- 
mum we use a quasi Newton-Raphson algorithm (see, e.g., [8]). This entails 
finding a point y which approximates the location of the maximum of f(y) 
up to a certain order. This point, in turn, will also be used to bound the 
difference f(y) - f(y*). 

To be more specific, take yo = 0, y\ = A~ 1 (i)^7(t), and define recursively 

y k+ i = y k + K\t)f{yk) = K\t)u{t) + K\t)r{yk). 
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Plugging this back into / produces 

f(y k+1 ) = ±(U(t),A-\t)U(t)) 

The incremental increase is 

f{yk+i) ~ f(yk) 

= (r(Vk) ~ r(yk-i),K\tMy) - (r(Vk-l) + %fc))/2]>, 

for some y on the line segment connecting y k and y k ~\. 

By running the iterative process \_a/2\ +1 times, one gets that the dif- 
ference f(y) — f(y*) is o(x~ a ), as required. By the form of the function /, 
and the definition of the recursive Newton-Raphson sequence, it is clear 
that f(y) is also a polynomial, which can be rearranged according to powers 
of 1/x. The coefficient of the polynomial are sums of products of both the 
random and deterministic derivatives. 

The finite sum, which appears on the right-hand side of expression (4.29), 
is clearly a functional of the derivatives only; it contains /i, which is a linear 
function of the derivatives vector N, and log Mo- After substituting /(y) for 
log Mo, this sum is expressible as power series. Terms with smaller order of 
magnitude than x~ a can be ignored. This way, a finite representation in the 
form of a polynomial may be obtained. 

Another term involves the normal density. A Taylor expansion of the ex- 
ponent function, combined with the polynomial representation of log Mo, 
enables us to rewrite the term as ^(f)(p/a) multiplied by a polynomial, with 
coefficient which are functions of the derivatives. In the final evaluation of 
the expectation, the normal density —cf){p/a) may be absorbed back into 
the joint normal density of the partial derivatives vector H. This leads to 
another normal density function, easily recognized as the conditional den- 
sity of U given {Z(t) = x}. To see this, note that E t (N\{Z(t) = x}) = 0, 
Yaxt(t>l\{Z{t) = x}) = E — p® p, and that the following two relations hold: 



2 



+ 



i-( P ,s-V) 
{U,[{^-p®p)- x ]U) 



N 



and 



det(E - P ®p) = det(S) • (1 - (p, £~~V» = a 2 det(S). 



Finally, we consider the ratio between Mo and So, which is given by the in- 

— 1/2 

tegral of exp{/(y) — f(y*) + 5(t + y/x) — 5(t)} over the region •I' 
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By centering about y±, the first point in the Newton-Raphson series, we ob- 
tain the following approximation: 

(2vr) rf / 2 |A n (t)r 1 / 2 

x jl + E Y [r(Y) - r(jft) + 6(t + y/x) - 5(t)] 

+ • • • + ±E Y [r(Y) - r( yi ) + 8(t + y/x) - <5(t)] a ), 
a! J 

for y ~ N(yi, A~ 1 (t)). The integrands, (r(y) — r(yi))" 1 , are polynomial in 
with coefficients that are polynomials of the partial derivatives. The 
expectation of moments of polynomials will produce again polynomials. The 
term 8{t + y / x) — 5(t) should be expanded about t up to the required order 
of accuracy. Specifically, if we let x = (n 1 ^), for 4 < v < 6, then, conditional 
on Y = y, we have the expansion 

(4.30) | l (y,5(t)y) | | 1 '" ( ), : ,„ (t)_ 

x 2 x 2 ml x m 

where m = \3 + a — v/2\ — 1. For u > 6 (and any given accuracy level a), 
S(t + y/x) — 5{t) may be approximated by zero since S(t) is of the order of 
magnitude of 0(x 3 n~ 1 ^ 2 ). 

Incorporating all the approximating expressions for the four terms men- 
tioned earlier (by standard polynomial multiplication and division), one ob- 
tains an a-degree polynomial in 1/x. The coefficients of the polynomial are 
of the form of a sum of products of the partial derivatives. The expectation 
of these products, with respect to the conditional distribution of N, can be 
handled using Wick's formula (see Adler [2]). 

4.5.1. Boundary effect. Up to this point the derivation of the terms asso- 
ciated with the local behavior of the random field ignored the boundedness 
of the set of parameters T. This is justified at points t for which the local 
neighborhood 14 is a subset of T. However, the approximation as presented 
above does not apply at points near the boundary, where Vj extends be- 
yond T. In this section we outline the modifications that allow a rigorous 
expansion in the vicinity of the boundary. Indeed, substituting V% fl T for 
Vt and walking once more the path that brought us to this point, one can 
observe that the proofs hold, essentially word for word, up to Section 4.5. 
No more than simple regularity conditions regarding the smoothness of the 
boundary are needed. Divergence in the details of the argument occur in 
Section 4.5, which deals with the presentation of the relevant term in the 
form of a conditional expectation of a function of the derivatives of the lo- 
cal process. Again, this function is composed of the elements that resulted 
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from the Mill's ratio type of approximation and the ratio between the maxi- 
mum of the exponentiated local field and the integral of that field. However, 
maximization and integration now occur only over the constrained region 

v t nr. 

Recall that the parameter set was represented in (1.7) in the form T = 
{t 6 M. d :gi(t) < 0, 1 < i < m}, for a finite collection of smooth constraint 
functions. Here we develop the algorithms for obtaining approximations of 
the constrained maximum and integral for this representation. In particular, 
in this section we will assume that t = t(x) is such that x ■ gj(t) converges 
to a negative constant for a given j, whereas x ■ gi(t) < — logx, for i ^ j. 
Depending on the order of approximation required, other situations may 
need to be considered. The algorithms presented for the present context can 
be generalized in a straightforward way in order to deal with other situations 
as well. Some care must be taken with the difference 5 n (t + y/x) — 6 n (t) and 
for the remainder r n (t). 

Let us start with the maximum of the local field. Using the notation of 
Section 4.5 we will consider a function of the form f(y) = q{y) +r(y), where 
q(y) is the quadratic dominant part, and r(y) is the remainder. Dropping 
the index j, we represent the constraint function in the form 

xg(t + y/x) = x • g(t) + (g{t),y) + x ■ v(t, y) = g + (g, y) + v(y), 

for an appropriate remainder function v. Note, that the linear part is domi- 
nant over the remainder part. Since the target function / is asymptotically 
concave, it follows that the constraint is active, unless the global maximum 
is in the interior of Vt H T. The global maximum Mq was approximated in 
the previous section. What is remains is to compute the maximum when the 
constraint function is active. The Sequential Quadratic Programming (SQP) 
algorithm (see [8]) is a natural generalization of the quasi Newton-Raphson 
algorithm from the previous section to the case where the constraint is ac- 
tive. The Lagrangian is maximized by an iterative refinement of its pair of 
arguments (y, A). The SQP algorithm applies the recursive formula: 

( Vk+i \ _ ( Vk \ _ H -i ( r{Vk) + hv(Vk) 



\Xk+iJ V-W v v (y. 



where 

'KHt) o 



-H~ 





i (A-Ht)[9®9']KHt) KHt)g 



" (g, An 1 (t)g) \ g'KHt) 1 

Starting at the (d + l)-origin (0,0), under appropriate regularity conditions, 
one can show that f{yu) approximates the constraint maximum up to an 
order of magnitude of l/x k . 
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The term Sq was approximated in the previous section by the expectation 
of polynomials of a Gaussian random vector Y with mean vector y\ and 
covariance matrix A~ 1 (t). In the presence of a boundary, the expectation 
should be replaced by the expectation over the event {g + g'Y + v(Y) < 0}. 
Below we indicate how one can approximate such expectations. 

Observe that the vector Y may be decomposed into two orthogonal (hence 
independent) components: g'Y = Y\.~ N{g'y\, (g, A~ 1 (t)<7)) and [I—A~ l (t)[g( 
g]/{g, A~ 1 (t)g)]y = Yi- The polynomials can then be reformulated in terms 
of polynomials in Y\ and Yi and the event can be represented in the form: 
{Yi < —g — v(Y\,Y2)}. This event can be approximated by the event {Y\ < 
v(g,Y2)}, for v formed by collection of all terms of the appropriate order 
from the iterative application of the function G(z±) = —z\ — v (z\, Y2), start- 
ing at the point z\ = g. The following step is the computation of the expec- 
tation with respect to Y\. One can use the recursion tpj(y) = —y-'~ 1 4>(y) + 
(j — l)ipj-2(y), where tpj(y) = z^4>(z) dz, <f> the standard normal density. 
The recursion is initiated by ipo(y) = &(y), the normal c.d.f. function, and 
ipi(y) = —<j>(y). Finally, after a Taylor expansion of the outcome, an expec- 
tation is taken with respect to Y%- This involves expectation of a Gaussian 
polynomial, and may be carried out with the aid of Wick's formula. 

4.5.2. A detailed example. Let us consider the exact form of the expan- 
sion for a = 1, and x = o(n 1 ^ 4 ). Putting together the previous arguments we 
have a second order expansion of the form: 



x *-\2*)- A l*<l>{x) j_e- 5 ^\K n {t)\ 1/2 

(4.31) 



T 

r _ _ 1 ~ i 

dt. 



1 - r 2 n (t)/2a 2 n (t) + -E t (ai) + E*(ai) 
x 



The expectation Et here is with respect to the conditional distribution of N 
given {Z(t) = x}. The coemcient a\ is given by 

a 1 =E y (y,j n (t)) + (p,S- 1 ^)(l-||A- 1 /2 Wf 7(i)||2 /2)/CT 2 (t) 

- [(E Y g 2 (Y,N) ~ ihiviM + (®Yh 3 (Y,/3) - h 3 ( yi ,p))}. 

It can be shown that all terms in the definition of a± but the first one are 
centered variables. Moreover, an expectation of (Y,5 n (t)) with respect to Py, 
followed by an expectation with respect to Pj shows that this term vanishes 
as well, so E t (ai) = 0. 

For the boundary we have a parallel result with regard to a\ : 



#y - (gt,yi) \ 2 



y\\g(t)\\-(g(t), yi ) \ _ " 
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where A = {(g(t),yi) < y\\g(t)\\}. Integration along the boundary of T pro- 
duces, after some manipulations, 



To summarize, we obtain: 

Theorem 4.9. Let {Z n (t);t £ T} be a random field given, marginally, 
by (1.4). Assume that T is a compact, convex subset of the d-dimensional 
Euclidean space. Let the conditions of Theorem 4.1 with a = 1 hold. As- 
sume further that 6 U {-), and ip u (') are bounded, as well as their second and 
forth order derivatives, respectively. Finally, assume that the function g, 
which defines the boundary of T , is piecewise continuously differentiable. 
Let x — > oo,n — > oo be such that x = o(n 1//4 ). Then P(sup ig ^ Z n (t) > x) is 
approximated by (1.6). For x = o(n 1 / 5 ) the term r^(t) / 2a^(t) can be ne- 



5. Extensions. In this work we presented a second order approximation, 
which accounts for edge effects, for a specific class of random fields. At most 
locations it was indicated in the proofs how one may carry out a higher 
order approximation by including more terms in the Taylor expansion of 
the local field and by using higher order asymptotic expansions (for contin- 
uous distributions). The product is a functional of these derivatives under a 
Gaussian joint distribution, which needs to be evaluated. Some details, such 
as the appropriate extension of the Mill's ratio expansion, have been omit- 
ted. Additional work would also be required in order to obtain higher order 
boundary corrections, which take into account curvature of the boundary 
and points of nondifferentiability. Although higher order expansions may 
produce better approximations, it is likely that they will be very complex 
and not provide additional insight. In addition, numerical examples in the 
Gaussian case suggest that the two term approximation is frequently rea- 
sonably accurate. 

The methods we have developed are quite general. The "moving aver- 
age" representation we have assumed for Xt has much the same effect that 
moving average representations have long assumed in time series analysis. 
It allows us to derive by calculation detailed estimates needed in our argu- 
ments, especially our use of versions of a local central limit theorem to prove 





2(2vr)( d - 1 )/ 2 




glected. 
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(4.23). It is easy to identify places in our argument, where one could rewrite 
the result of a calculation as an assumption to provide an approximation 
for a more general class of random fields, which then could be specialized to 
subclasses by checking the assumptions. 

For the well developed case of a Gaussian random field {Z(t):t £ T}, 
with components that are standard normal, several technical conditions and 
aspects of the proof can be significantly improved and simplified. See [9] for 
details. 

An interesting generalization is random fields, even Gaussian random 
fields, with both smooth and nonsmooth components. Such fields arise nat- 
urally in the monitoring of images over time, so time is another component 
that should be added to the parameter space. At each given point in time 
the score may vary smoothly as the function of the structure of the signal. 
However, if a signal may abruptly appear, then the score will not be smooth 
in the direction of the time component. The advantage of our method, which 
was originally developed in nonsmooth settings, is its flexibility. The essential 
argument is blind with respect to issues of smoothness, and the calculations 
can be carried out in a unified manner. Only in the detailed investigation of 
the local field does the level of smoothness become important. 



Proof of Lemma 4.3. Let B = B(r) and assume that y £ [— r, r] max- 
imizes h. The one-dimensional and multi-dimensional cases are dealt sepa- 
rately. Consider first the case d = 1. If y = r then h(y) — h(y) > H{y — y), 
which gives 



The last inequality is verified by recalling the elementary inequality y/(l — 
e~ v ) < y + 1, which holds for every y > 0. The other extremal situation, 
y = —r, is exactly the same. One only has to consider the steepest negative 
slope —H instead of H. If \y\ < r we use them both to bound h(y) — h(y) 
from below by —H\y — y\. The integral is then evaluated over [— r, y] for 
positive y, and over [y,r] for negative y. Consequently, a lower bound for 
interior points, (H + 1/r) , is achieved, which together with the right-hand 
side of (A.l) completes the proof for the one-dimensional case. 

The multi-dimensional case is again divided into two scenarios. One is 
when y lies on the boundary and the other is when y is an interior point. 
We first treat boundary points ||y|| = r. To begin with, assume that the 




APPENDIX: MORE PROOFS 



(A.l) 




H- 1 {l-e- 2rH )>(H + l/2r) 
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projections of y onto each plane Hi — Uj, j, is the bisector of the axes 
yi,Uj and further, that tji > 0, for 1 < % < d. That is, assume that the jji's 
are positive and that y± = fa = • ■ ' = Vd, with a common value which must 
be equal to rdr 1 ! 2 . We refer to this y as "simple." For every y G B there 
exists, by the mean-value theorem, a point r/ on the line segment joining y 
and y, such that h{y) — h{y) = (y — y)'h(rj). Now, denote by Q = Ilf=i Q% 
the (d-dimensional) cube inscribed in B having sides 2rd -1 / 2 . Therefore, 

(A.2) / e Mv)-fc0) > FT / eto-^kfo) 

By a reduction to the one-dimensional case it is clear that every univariate 
integral above is in the form considered there for boundary points, with 
r replaced by rd~ l l 2 . Since Hi = max zG £ < H for all 1 < i < d, we 

obtain a lower bound (H + l/2rd~ 1 / 2 )~ d . The general case of a boundary 
point y, which is not in the form depicted above, is handled similarly except 
that the axes should be rotated first, to make y "simple." Denote by T 
the corresponding rotation matrix. Since T is orthogonal norm preserving 
transformation the bound changes only slightly, and is given in the form 
indicated by the lemma. 

Last, let y be an interior point. Assume, with no loss of generality, that y is 
simple (and also yi < rd" 1 ^ 2 ). This is clearly enough by the above argument. 
We repeat the scheme that yielded relation (A.2). This time each univariate 
integral is evaluated over the subinterval [— rd -1 / 2 , yi\. This is akin to the 
one-dimensional case with rd~ 1 / 2 substituted for r, and thus, the requested 
bound is obtained. □ 

Proof of Lemma 4.8. Consider first the case a 2 = 1. The general case 
will follow by substituting ax for x, fj,/a for fj>, and y/a for y. The log- 
likelihood ratio associated with transforming from mean fi to mean y is 
(fj, - y)Y - (fi 2 - y 2 )/2. Therefore, 

E^{e~ xY ; Y >y) = E y (e- {x+y -^ iY - y) ; Y -y>0)x 6 -^-(2/-m) 2 /2_ 

The random variable Y — y has a standard normal distribution under the 
distribution measure ¥ y . Its density satisfies: 

z 2n+2 

~ 2 n + 1 (n + 1)!' 

The approximation and bound on the error are obtained by integration. □ 
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